Resampling performance improvement and sparse aggregation columns support by IvoDD · Pull Request #3062 · man-group/ArcticDB

IvoDD · 2026-04-30T09:39:24Z

Reference Issues/PRs

Monday ref: 11679866800

Depends on PRs #3091 and #3110

Issues

There is complicated bucket hopping logic in three places: generate_output_index_column, generate_resampling_output_column, SortedAggregator::aggregate
The bucket hopping logic involves many branches with loads of checks

Changes (split per commit for easier review)

Adds C++ benchmarks which measure the CPU-intensive part of resampling.
Pure move of the generate_output_index_column to sorted_aggregation.cpp.
- This way all bucket hopping logic is in one place.
Construct a ResampleMapping in generate_output_index_column and use it directly in other methods.
- ResampleMapping just has a mapping from output_row to (start_column_index, start_column_offset), (end_column_index, end_column_offset).
- Resolves the 3 places with similar logic.
- Makes the implementation of sparse aggregation easier.
Use galloping search in generate_output_index_column to skip past all rows in a single bucket at once.
- Index column construction was the bottleneck: aggregation vectorises well but index iteration does not.
- Changes complexity from O(num_input_rows + num_buckets) to O(num_buckets × log(rows_per_bucket)).
- Always ≤ O(num_input_rows + num_buckets) even when num_buckets ≥ num_input_rows.
Preallocate the output index column to min(num_buckets, num_input_rows) instead of num_buckets.
- Galloping search has a higher constant than linear scan and regresses at low rows per bucket.
- Slightly improves the case where most buckets are empty due to smaller allocation.
Use a runtime heuristic to choose between linear scan and galloping search.
- Linear scan is faster below ~32 rows/bucket (because of smaller constant and better branch prediction); galloping search is faster above.
- Threshold determined empirically from benchmarks at intermediate bucket counts. Extra benchmarking was done with more parametrization of the existing benchmark. Not kept in PR to avoid a huge amount of benchmarking code.
- Recovers the Dense-10 (100k buckets) and Empty regressions from commit 3 while retaining all gains elsewhere.
Implement sparse resampling.
- Small change made straightforward by the ResampleMapping from commit 2.
- Minimal overhead for the dense case.
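The complexity claims for galloping search above can be checked with a short derivation. This is a sketch under a stated assumption: galloping past a bucket holding $r_i$ rows costs about $1 + \log_2(1 + r_i)$ comparisons. The symbols $b$ (num_buckets), $n$ (num_input_rows) and $r_i$ are introduced here for illustration only.

$$
T_{\text{gallop}} = \sum_{i=1}^{b} \bigl(1 + \log_2(1 + r_i)\bigr), \qquad \sum_{i=1}^{b} r_i \le n
$$

Since $\log_2(1 + r) \le r$ for every integer $r \ge 0$:

$$
\sum_{i=1}^{b} \log_2(1 + r_i) \le \sum_{i=1}^{b} r_i \le n \quad\Longrightarrow\quad T_{\text{gallop}} \le b + n
$$

And by concavity of the logarithm (Jensen's inequality):

$$
\sum_{i=1}^{b} \log_2(1 + r_i) \le b \, \log_2\!\Bigl(1 + \frac{n}{b}\Bigr) \quad\Longrightarrow\quad T_{\text{gallop}} = O\Bigl(b \,\bigl(1 + \log(1 + n/b)\bigr)\Bigr)
$$

So the comparison count never exceeds the linear-scan bound asymptotically, even when most buckets are empty. The constant factor is a separate matter, which is what the benchmarks measure.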

Resample benchmark timings

BM_resample/<rows_per_seg>/<num_segs>/<num_buckets>/<num_cols>. Total rows ~1M.
Source: cpp/arcticdb/processing/test/benchmark_resample.cpp. Times in ms, --benchmark_min_time=2s.

| Regime | Args | rows/bucket | Description |
|---|---|---:|---|
| Dense-1k | `100k × 10, 1k buckets` | ~1000 | Many rows/bucket, single row-slice |
| Dense-100 | `100k × 10, 10k buckets` | ~100 | Medium rows/bucket, single row-slice |
| Dense-10 | `100k × 10, 100k buckets` | ~10 | Few rows/bucket, single row-slice |
| Spanning | `2k × 500, 100 buckets` | ~10k | Buckets span multiple row-slices |
| Empty | `100k × 10, 10M buckets` | <1 | Bucket smaller than row spacing; most empty |

1 aggregation column

| # | Change | D-1k | D-100 | D-10 | Spanning | Empty |
|---|---|---:|---:|---:|---:|---:|
| 0 | Baseline | 1.27 | 1.34 | 1.47 | 1.65 | 11.1 |
| 1 | Code move | 1.02 (−20%) | 1.12 (−16%) | 1.27 (−14%) | 1.40 (−15%) | 11.1 (0%) |
| 2 | ResampleMapping | 1.02 (−20%) | 1.12 (−16%) | 1.32 (−10%) | 1.40 (−15%) | 11.8 (+6%) |
| 3 | Galloping search | 0.059 (−95%) | 0.385 (−71%) | 2.94 (+100%) | 0.285 (−83%) | 21.9 (+97%) |
| 4 | Bounded allocation | 0.058 (−95%) | 0.396 (−70%) | 2.91 (+98%) | 0.291 (−82%) | 21.5 (+94%) |
| 5 | Heuristic (lin/EUB) | 0.059 (−95%) | 0.383 (−71%) | 1.27 (−14%) | 0.293 (−82%) | 11.5 (+4%) |
| 6 | Sparse-input support | 0.068 (−95%) | 0.449 (−66%) | 1.28 (−13%) | 0.296 (−82%) | 11.5 (+4%) |

100 aggregation columns

| # | Change | D-1k | D-100 | D-10 | Spanning | Empty |
|---|---|---:|---:|---:|---:|---:|
| 0 | Baseline | 1.37 | 1.43 | 1.56 | 6.22 | 48.0 |
| 1 | Code move | 1.11 (−19%) | 1.18 (−17%) | 1.34 (−14%) | 5.92 (−5%) | 46.2 (−4%) |
| 2 | ResampleMapping | 1.11 (−19%) | 1.19 (−17%) | 1.39 (−11%) | 5.87 (−6%) | 50.4 (+5%) |
| 3 | Galloping search | 0.148 (−89%) | 0.471 (−67%) | 2.96 (+90%) | 4.65 (−25%) | 63.1 (+31%) |
| 4 | Bounded allocation | 0.148 (−89%) | 0.480 (−66%) | 2.95 (+89%) | 4.67 (−25%) | 44.1 (−8%) |
| 5 | Heuristic (lin/EUB) | 0.149 (−89%) | 0.477 (−67%) | 1.33 (−15%) | 4.70 (−24%) | 35.9 (−25%) |
| 6 | Sparse-input support | 0.158 (−88%) | 0.537 (−62%) | 1.35 (−13%) | 4.94 (−21%) | 36.0 (−25%) |

Deltas vs baseline (row 0).

Notes on benchmark results

Load average varied across runs, so the results contain some artifacts, such as the apparent improvements from the pure "Code move" commit.
Galloping search significantly improves speed when a single bucket contains many rows. Thorough benchmarking showed that exponential upper bound (EUB) becomes faster than linear search at ~32 rows per bucket, which explains the regressions in the 10-rows-per-bucket and mostly-empty-bucket cases.
Bounded allocation mostly helps the empty case, as expected.
Using the heuristic to choose between EUB and linear search helps when rows_per_bucket < 32. It is even more efficient than the baseline due to slightly better branch prediction (improved use of ARCTICDB_LIKELY and ARCTICDB_UNLIKELY).
Final state: every regime is at or faster than baseline, within noise. Dense with 1000 rows per bucket is the biggest winner, at roughly 19x with 1 aggregation column (1.27 ms → 0.068 ms) and roughly 9x with 100 (1.37 ms → 0.158 ms). The mostly-empty-bucket case is the only use case with no improvement for 1 aggregation column, where it remains around baseline (+4%); with 100 aggregation columns it improves by 25%.

claude · 2026-05-14T13:54:39Z

ArcticDB Code Review Summary

Delta since last review is two new commits on the rebased branch (8d9a8ff "Sparse resample hypothesis test", 25f608d "Fix resampling sparse docs"). The 54-file BEFORE..AFTER range is dominated by rebase noise from master; the genuinely new PR content touches only 4 files (_store.py, library.py, processing.py, test_arrow_sparse.py). All changes are correct and low-risk.

Documentation

The obsolete "resampling does not yet support sparse data" warnings in compact_incomplete/compact docstrings (_store.py, library.py) and the QueryBuilder.resample/agg docstrings (processing.py) have been removed, aligning user-facing docs with the new sparse-aggregation support. This resolves the previously-flagged documentation gap.
The Claude-maintained technical docs (docs/claude/cpp/PROCESSING.md, docs/claude/python/QUERY_PROCESSING.md) do not contain contradictory statements, so no further doc action is required for this delta.

Tests

New test_sparse_polars_resample_hypothesis exercises sparse float resampling across all aggregation ops against polars - good coverage for the new feature.
_polars_agg_expr correctly hoisted from a TestSparseArrowResample staticmethod to module scope; all call sites updated (no stale self._polars_agg_expr references remain).

Notes (no action required)

This PR is stacked and was rebased onto a newer master (which now contains the compact_data/arithmetic-promotion/pow changes that appear as noise in the raw delta) - merge order still matters.

#### Reference Issues/PRs

Optimizations on top of #3091
Used in #3062

#### What does this implement or fix?

Some micro optimizations on binary search methods:
- Don't keep `TypedBlockData` in `ColumnDataIterator`. Instead only keep `block_data_` and `block_size_`
- Don't recalculate block pointer and size when we already know them during gallop

#### Any other comments?

Benchmarks for all search and iteration methods:

| Benchmark | Before (ns) | After (ns) | Delta |
|---|---:|---:|---:|
| iterate_irregular_blocks_1 (one row per block) | 478,496 | 311,163 | −35.0% |
| iterate_with_iterator (100 rows) | 798 | 719 | −9.9% |
| exponential_lb_single_block (in first 100) | 356 | 323 | −9.2% |
| exponential_lb_single_block (full gallop) | 458 | 424 | −7.4% |
| exponential_lb_regular (in first 100) | 364 | 339 | −6.7% |
| exponential_lb_irregular_1000 (in first 100) | 360 | 335 | −6.7% |
| exponential_lb_irregular_1000 (full gallop) | 496 | 476 | −3.9% |
| exponential_lb_regular (full gallop) | 504 | 489 | −2.9% |
| exponential_lb_irregular_1 (in first 100) | 464 | 455 | −2.0% |
| exponential_lb_irregular_1 (full gallop) | 687 | 679 | −1.3% |
| lower_bound_single_block | 411 | 394 | −4.1% |
| lower_bound_irregular_1000 | 444 | 431 | −3.0% |
| lower_bound_irregular_1 | 595 | 579 | −2.8% |
| lower_bound_regular_blocks | 443 | 436 | −1.4% |
| iterate_single_block | 27,305 | 27,247 | −0.2% |
| iterate_regular_blocks | 29,051 | 28,734 | −1.1% |
| iterate_irregular_blocks_1000 | 28,136 | 27,893 | −0.9% |
| iterate_with_scalar_at (100 rows) | 182,183,122 | 182,088,026 | −0.1% |

#### Checklist

<details>
<summary> Checklist for code changes... </summary>

- [ ] Have you updated the relevant docstrings, documentation and copyright notice?
- [ ] Is this contribution tested against [all ArcticDB's features](../docs/mkdocs/docs/technical/contributing.md)?
- [ ] Do all exceptions introduced raise appropriate [error messages](https://docs.arcticdb.io/error_messages/)?
- [ ] Are API changes highlighted in the PR description?
- [ ] Is the PR labelled as enhancement or bug so it appears in autogenerated release notes?
</details>

Co-authored-by: Ivo <ivo.dilov@man.com>

vasil-pashov · 2026-06-01T07:23:15Z

It would be nice if we can get hypothesis tests covering some basic scenarios against polars, no need to test all supported parameters as some are quite painful to test.

vasil-pashov · 2026-06-01T07:38:13Z

+            size_t{0},
+            [](size_t acc, const auto& col) { return acc + col->row_count(); }
+    );
+    const auto max_output_rows = std::min(bucket_boundaries.size() - 1, total_input_rows);


How can you end up with bucket_boundaries.size() - 1 > total_input_rows?

If we have loads of empty buckets. E.g. use resample("1h") on a table which has a 24h frequency like 2026-01-01, 2026-01-02, 2026-01-03
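To make the example above concrete, here is a minimal sketch with timestamps expressed as whole hours. `make_bucket_boundaries` and `max_output_rows` are hypothetical helpers written for this comment, not ArcticDB functions, and the sketch assumes that an empty bucket produces no output row (which is what the `std::min` bound implies).

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: boundaries of fixed-width buckets covering [first, last].
// Timestamps are plain integers (hours since an arbitrary origin) for brevity.
std::vector<std::int64_t> make_bucket_boundaries(std::int64_t first, std::int64_t last, std::int64_t width) {
    assert(width > 0 && first <= last);
    std::vector<std::int64_t> boundaries;
    for (std::int64_t b = first; b <= last; b += width) {
        boundaries.push_back(b);
    }
    boundaries.push_back(boundaries.back() + width); // right edge of the last bucket
    return boundaries;
}

// Upper bound on output rows, assuming only non-empty buckets emit a row.
std::size_t max_output_rows(const std::vector<std::int64_t>& bucket_boundaries, std::size_t total_input_rows) {
    assert(!bucket_boundaries.empty());
    return std::min(bucket_boundaries.size() - 1, total_input_rows);
}
```

With three daily rows at hours 0, 24 and 48 and a 1h bucket width, `make_bucket_boundaries(0, 48, 1)` yields 50 boundaries, i.e. 49 buckets, while only 3 of them can be non-empty, so the bound picks `total_input_rows`.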

Previously each of `generate_output_index_column`, `generate_resample_output_column` and `aggregate` had complicated logic to identify which input rows correspond to which output row. This is simplified by creating a `ResampleMapping` when building the output index column, which stores the input values each output row corresponds to. `ResampleMapping` is then used in the other methods.
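A rough sketch of the idea, not the actual `ResampleMapping` from this PR: all type and function names below (`RowPos`, `OutputRowRange`, `build_mapping`, `sum_per_output_row`) are made up for illustration, input columns are modelled as plain `std::vector`s, and buckets are assumed left-closed. The mapping is built once from the index and then reused by an aggregator that needs no bucket logic of its own.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Position of a row inside a chunked input: which column (row-slice) and which offset in it.
struct RowPos {
    std::size_t column_index;
    std::size_t column_offset;
};

// One output row: input rows in [start, end) fall into the bucket starting at bucket_start.
struct OutputRowRange {
    std::int64_t bucket_start;
    RowPos start;
    RowPos end; // exclusive
};

// Linear-scan construction over globally sorted timestamps.
// Bucket i is [boundaries[i], boundaries[i + 1]); empty buckets get no entry.
std::vector<OutputRowRange> build_mapping(
        const std::vector<std::vector<std::int64_t>>& index_chunks,
        const std::vector<std::int64_t>& boundaries) {
    std::vector<OutputRowRange> mapping;
    if (boundaries.size() < 2) return mapping; // no buckets at all
    std::size_t bucket = 0;
    for (std::size_t c = 0; c < index_chunks.size(); ++c) {
        for (std::size_t r = 0; r < index_chunks[c].size(); ++r) {
            const std::int64_t ts = index_chunks[c][r];
            if (ts < boundaries.front()) continue; // before the first bucket
            while (bucket + 1 < boundaries.size() && ts >= boundaries[bucket + 1]) ++bucket;
            if (bucket + 1 >= boundaries.size()) return mapping; // past the last bucket
            if (mapping.empty() || mapping.back().bucket_start != boundaries[bucket]) {
                mapping.push_back({boundaries[bucket], {c, r}, {c, r + 1}});
            } else {
                mapping.back().end = {c, r + 1};
            }
        }
    }
    return mapping;
}

// Example consumer: a sum aggregation that only walks the precomputed ranges.
std::vector<double> sum_per_output_row(
        const std::vector<std::vector<double>>& value_chunks,
        const std::vector<OutputRowRange>& mapping) {
    std::vector<double> out;
    out.reserve(mapping.size());
    for (const OutputRowRange& range : mapping) {
        double acc = 0.0;
        for (std::size_t c = range.start.column_index; c <= range.end.column_index; ++c) {
            assert(c < value_chunks.size());
            const std::size_t from = (c == range.start.column_index) ? range.start.column_offset : 0;
            const std::size_t to = (c == range.end.column_index) ? range.end.column_offset : value_chunks[c].size();
            for (std::size_t r = from; r < to; ++r) acc += value_chunks[c][r];
        }
        out.push_back(acc);
    }
    return out;
}
```

Keeping the (column, offset) pairs in one structure is what lets every aggregator share the same bucket boundaries, including buckets that span two row-slices.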

A lot of resampling runtime was spent generating the output index column. This can be sped up significantly in the common case where the number of buckets is much smaller than the number of input rows by using exponential (galloping) binary search.
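For readers unfamiliar with the technique, a minimal sketch of exponential (galloping) search over a single contiguous sorted array follows. `gallop_lower_bound` is a name chosen for this illustration; the real code works over column blocks via `ColumnDataIterator`, which this sketch does not attempt to model.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Returns the first position p >= start with sorted[p] >= target, or sorted.size() if none.
// Cost is O(log(p - start)) rather than O(p - start) for a linear scan.
std::size_t gallop_lower_bound(const std::vector<std::int64_t>& sorted, std::size_t start, std::int64_t target) {
    assert(start <= sorted.size());
    const std::size_t n = sorted.size();
    if (start == n || sorted[start] >= target) return start;
    // Invariant from here on: sorted[lo] < target.
    std::size_t lo = start;
    std::size_t step = 1;
    while (lo + step < n && sorted[lo + step] < target) {
        lo += step;
        step *= 2; // double the stride after every successful probe
    }
    // Either hi == n or sorted[hi] >= target, so the answer lies in (lo, hi].
    const std::size_t hi = std::min(lo + step, n);
    const auto first = sorted.begin() + static_cast<std::ptrdiff_t>(lo + 1);
    const auto last = sorted.begin() + static_cast<std::ptrdiff_t>(hi);
    return static_cast<std::size_t>(std::lower_bound(first, last, target) - sorted.begin());
}
```

Calling it repeatedly with `start` set to the previous result and `target` set to the next bucket boundary skips all rows of a bucket in one call, which is where the gain for large rows-per-bucket comes from.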

Bounding the preallocation speeds up and reduces memory usage in the very rare case where num_buckets >> num_input_rows.

With benchmarking of various rows_per_bucket it was confirmed that exponential_search becomes faster than linear scan at around 32 elements. For <32 rows per bucket the linear pass is faster. For >32 the exponential search is faster.
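A sketch of what such a runtime choice could look like. The PR does not spell out how rows per bucket is estimated, so the average `num_input_rows / num_buckets` used here is an assumption, as are the names `SearchStrategy`, `choose_strategy` and `kGallopThresholdRowsPerBucket`. Treating exactly 32 as "galloping" is also an arbitrary choice, since the measured crossover is only approximate.

```cpp
#include <cassert>
#include <cstddef>

// Crossover reported by the benchmarks in this PR (~32 rows per bucket).
constexpr std::size_t kGallopThresholdRowsPerBucket = 32;

enum class SearchStrategy { LinearScan, Galloping };

// Assumption: the average number of rows per bucket is a good enough estimate.
SearchStrategy choose_strategy(std::size_t num_input_rows, std::size_t num_buckets) {
    if (num_buckets == 0) {
        return SearchStrategy::LinearScan; // nothing to search for; avoid dividing by zero
    }
    const std::size_t rows_per_bucket = num_input_rows / num_buckets;
    return rows_per_bucket >= kGallopThresholdRowsPerBucket ? SearchStrategy::Galloping
                                                            : SearchStrategy::LinearScan;
}
```

Applied to the benchmark regimes (about 1M rows each), this picks galloping for Dense-1k, Dense-100 and Spanning, and linear scan for Dense-10 and Empty, which matches where commit 5 recovers the regressions.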

Construct output agg column based on rs_index of input sparse columns. Then use sparse iterators to populate the values.
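As an illustration only, the sketch below models a sparse column as a presence bitmap plus packed values, and uses a prefix-count array as a stand-in for a rank/select index. `SparseColumn`, `build_rank` and `sparse_sum` are hypothetical names, the output uses `std::optional` (C++17) instead of a real sparse column, and none of this is the PR's actual code.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <vector>

// Hypothetical sparse column: one presence bit per logical row, values stored only for present rows.
struct SparseColumn {
    std::vector<bool> present;
    std::vector<double> values;
};

// rank[i] = number of present rows in [0, i). Stand-in for a rank/select index.
std::vector<std::size_t> build_rank(const SparseColumn& col) {
    std::vector<std::size_t> rank(col.present.size() + 1, 0);
    for (std::size_t i = 0; i < col.present.size(); ++i) {
        rank[i + 1] = rank[i] + (col.present[i] ? 1 : 0);
    }
    assert(rank.back() == col.values.size());
    return rank;
}

// row_ends[i] is the exclusive end row of output row i; output row i covers
// logical rows [row_ends[i - 1], row_ends[i]), starting from 0.
// An output row with no present input values stays missing (std::nullopt).
std::vector<std::optional<double>> sparse_sum(const SparseColumn& col, const std::vector<std::size_t>& row_ends) {
    const std::vector<std::size_t> rank = build_rank(col);
    std::vector<std::optional<double>> out;
    out.reserve(row_ends.size());
    std::size_t begin = 0;
    for (const std::size_t end : row_ends) {
        assert(begin <= end && end <= col.present.size());
        const std::size_t first = rank[begin]; // first packed value of this output row
        const std::size_t last = rank[end];    // one past its last packed value
        if (first == last) {
            out.push_back(std::nullopt);
        } else {
            double acc = 0.0;
            for (std::size_t k = first; k < last; ++k) acc += col.values[k];
            out.push_back(acc);
        }
        begin = end;
    }
    return out;
}
```

The point of the rank lookups is that the sparsity of the output can be decided from the bitmap alone, before any value is touched, and the dense case degenerates to `first == begin` and `last == end`.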

IvoDD · 2026-06-04T06:51:17Z

The time_compact_data benchmark failure is unrelated. ColumnDataIterator changes were from a previous PR.

IvoDD changed the base branch from master to arrow-use-in-memory-storage-for-unit-tests April 30, 2026 10:17

maxim-morozov self-requested a review April 30, 2026 16:42

IvoDD force-pushed the arrow-use-in-memory-storage-for-unit-tests branch from 419c30a to 0de92a2 Compare May 5, 2026 14:11

Base automatically changed from arrow-use-in-memory-storage-for-unit-tests to master May 7, 2026 11:30

IvoDD force-pushed the sparse-resampling-support branch from a5ac868 to a9e8ee4 Compare May 11, 2026 09:18

IvoDD changed the base branch from master to binary-search-utils May 11, 2026 09:18

IvoDD force-pushed the sparse-resampling-support branch 2 times, most recently from 36122bc to 4231a4f Compare May 12, 2026 15:18

IvoDD force-pushed the binary-search-utils branch 2 times, most recently from 5679aa0 to 4b7e881 Compare May 13, 2026 08:20

IvoDD force-pushed the sparse-resampling-support branch 3 times, most recently from 210a17b to 086284c Compare May 13, 2026 14:53

IvoDD mentioned this pull request May 14, 2026

Optimize binary search methods #3110

Merged


IvoDD force-pushed the sparse-resampling-support branch from 086284c to 5e4edb7 Compare May 14, 2026 12:10

IvoDD added the patch Small change, should increase patch version label May 14, 2026

IvoDD changed the title ~~[Draft] Sparse resampling support~~ Resampling performance improvement and sparse aggragation columns support May 14, 2026

IvoDD changed the base branch from binary-search-utils to binary-search-utils-optimization May 14, 2026 13:12

IvoDD marked this pull request as ready for review May 14, 2026 13:47

IvoDD requested review from alexowens90 and poodlewars as code owners May 14, 2026 13:47

IvoDD changed the title ~~Resampling performance improvement and sparse aggragation columns support~~ Resampling performance improvement and sparse aggregation columns support May 14, 2026